Enhancing Security Patch Identification by Capturing Structures in Commits
With the rapidly increasing amount of open source software (OSS), the majority
of the software vulnerabilities in the open source components are fixed
silently, which leads to the deployed software that integrated them being
unable to get a timely update. Hence, it is critical to design a security patch
identification system to ensure the security of the utilized software. However,
most of the existing works for security patch identification just consider the
changed code and the commit message of a commit as a flat sequence of tokens
with simple neural networks to learn its semantics, while the structure
information is ignored. To address these limitations, in this paper, we propose
our well-designed approach E-SPI, which extracts the structure information
hidden in a commit for effective identification. Specifically, it consists of
a code change encoder, which extracts the syntactic structure of the changed
code and uses a BiLSTM to learn the code representation, and a message encoder,
which constructs a dependency graph for the commit message and uses a graph
neural network (GNN) to learn the message representation. We further enhance
the code change encoder
by embedding contextual information related to the changed code. To demonstrate
the effectiveness of our approach, we conduct extensive experiments against
six state-of-the-art approaches on an existing dataset and in a real
deployment environment. The experimental results confirm that our approach
significantly outperforms current state-of-the-art baselines.
EasyNet: An Easy Network for 3D Industrial Anomaly Detection
3D anomaly detection is an emerging and vital computer vision task in
industrial manufacturing (IM). Recently many advanced algorithms have been
published, but most of them cannot meet the needs of IM. They have several
disadvantages: i) they are difficult to deploy on production lines because
they rely heavily on large pre-trained models; ii) they greatly increase
storage overhead through overuse of memory banks; iii) their inference
cannot run in real time. To overcome these issues, we propose an easy
and deployment-friendly network (called EasyNet) without using pre-trained
models and memory banks: firstly, we design a multi-scale multi-modality
feature encoder-decoder to accurately reconstruct the segmentation maps of
anomalous regions and encourage the interaction between RGB images and depth
images; secondly, we adopt a multi-modality anomaly segmentation network to
achieve a precise anomaly map; thirdly, we propose an attention-based
information entropy fusion module for feature fusion during inference, making
it suitable for real-time deployment. Extensive experiments show that EasyNet
achieves an anomaly detection AUROC of 92.6% without using pre-trained models
and memory banks. In addition, EasyNet is faster than existing methods, with a
high frame rate of 94.55 FPS on a Tesla V100 GPU.
TransRepair: Context-aware Program Repair for Compilation Errors
Automatically fixing compilation errors can greatly raise the productivity of
software development by guiding novice or AI programmers to write and
debug code. Recently, learning-based program repair has gained extensive
attention and become the state of the art in practice, but it still leaves
plenty of room for improvement. In this paper, we propose an end-to-end
solution TransRepair to locate the error lines and create the correct
substitute for a C program simultaneously. Unlike its counterparts, our
approach takes into account the context of the erroneous code and the diagnostic
compilation feedback. We then devise a Transformer-based neural network to
learn the ways of repair from the erroneous code as well as its context and the
diagnostic feedback. To increase the effectiveness of TransRepair, we summarize
5 types and 74 fine-grained sub-types of compilation errors from two
real-world program datasets and the Internet. A program corruption
technique is then developed to synthesize a large dataset of 1,821,275 erroneous C
programs. Through extensive experiments, we demonstrate that TransRepair
outperforms the state-of-the-art in both single repair accuracy and full repair
accuracy. Further analysis sheds light on the strengths and weaknesses of
contemporary solutions for future improvement.
Comment: 11 pages, accepted to ASE '2
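The abstract does not describe how the program corruption technique works, so the following is only a minimal sketch of the general idea: take a compilable C program, inject one known compilation error, and record the line that was damaged so it can serve as a localization label. The function name `drop_semicolon`, the sample program, and the choice of a single "missing semicolon" error type are all assumptions made for illustration; they are not taken from TransRepair.

```python
import random
from typing import Tuple


def drop_semicolon(source: str, rng: random.Random) -> Tuple[str, int]:
    """Remove the trailing semicolon from one randomly chosen line.

    Returns the corrupted source and the 1-based number of the damaged line.
    This is a hypothetical corruption rule, not the one used by TransRepair.
    """
    lines = source.split("\n")
    # Only lines that end with ';' can lose a statement terminator.
    candidates = [i for i, line in enumerate(lines) if line.rstrip().endswith(";")]
    if not candidates:
        raise ValueError("no line ends with a semicolon")
    idx = rng.choice(candidates)
    # Drop trailing whitespace together with the final ';'.
    lines[idx] = lines[idx].rstrip()[:-1]
    return "\n".join(lines), idx + 1


# A small, hypothetical C program used as the correct starting point.
PROGRAM = """#include <stdio.h>
int main(void) {
    int x = 1;
    printf("%d\\n", x);
    return 0;
}"""

# Seeding the generator makes the synthetic example reproducible.
corrupted, error_line = drop_semicolon(PROGRAM, random.Random(0))
```

Each call yields a (broken program, error line) pair, and the original program is the repair target. A dataset built this way would need many such rules, one per error sub-type.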
Real3D-AD: A Dataset of Point Cloud Anomaly Detection
High-precision point cloud anomaly detection is the gold standard for
identifying defects in advanced machining and precision manufacturing.
Despite some methodological advances in this area, the scarcity of datasets and
the lack of a systematic benchmark hinder its development. We introduce
Real3D-AD, a challenging high-precision point cloud anomaly detection dataset,
addressing the limitations in the field. With 1,254 high-resolution 3D items,
each containing from forty thousand to millions of points, Real3D-AD is the
largest dataset for high-precision 3D industrial anomaly detection to date.
Real3D-AD surpasses existing 3D anomaly detection datasets in
point cloud resolution (0.0010 mm to 0.0015 mm), 360-degree coverage, and perfect
prototypes. Additionally, we present a comprehensive benchmark for Real3D-AD,
revealing the absence of baseline methods for high-precision point cloud
anomaly detection. To address this, we propose Reg3D-AD, a registration-based
3D anomaly detection method incorporating a novel feature memory bank that
preserves local and global representations. Extensive experiments on the
Real3D-AD dataset highlight the effectiveness of Reg3D-AD. For reproducibility
and accessibility, we provide the Real3D-AD dataset, benchmark source code, and
Reg3D-AD on our website: https://github.com/M-3LAB/Real3D-AD
Unblind Your Apps: Predicting Natural-Language Labels for Mobile GUI Components by Deep Learning
According to the World Health Organization (WHO), it is estimated that
approximately 1.3 billion people live with some form of vision impairment
globally, of whom 36 million are blind. Due to their disability, engaging this
minority in society is a challenging problem. The recent rise of smart
mobile phones provides a new solution by giving blind users convenient
access to information and services for understanding the world. Users with
vision impairment can adopt the screen reader embedded in the mobile operating
systems to read the content of each screen within the app, and use gestures to
interact with the phone. However, the prerequisite of using screen readers is
that developers have to add natural-language labels to the image-based
components when they are developing the app. Unfortunately, more than 77% of apps
have issues of missing labels, according to our analysis of 10,408 Android
apps. Most of these issues are caused by developers' lack of awareness and
knowledge in considering this minority. Even if developers want to add
labels to UI components, they may not come up with concise and clear
descriptions, as most of them have no vision impairment themselves. To overcome these
challenges, we develop a deep-learning-based model, called LabelDroid, to
automatically predict the labels of image-based buttons by learning from
large-scale commercial apps in Google Play. The experimental results show that
our model can make accurate predictions and the generated labels are of higher
quality than those from real Android developers.
Comment: Accepted to 42nd International Conference on Software Engineering
Trends and Patterns of Disparities in Burden of Lung Cancer in the United States, 1974-2015
Background: Although lung cancer incidence and mortality have been declining since the 1990s, the extent to which such progress has been made is unequal across population segments. Updated epidemiologic data on trends and patterns of disparities are lacking.
Methods: Data on lung cancer cases and deaths during 1974 to 2015 were extracted from the Surveillance, Epidemiology, and End Results program. Age-standardized lung cancer incidence and mortality and their annual percent changes were calculated by histologic types, demographic variables, and tumor characteristics.
Results: Lung cancer incidence decreased since 1990 (1990 to 2007: annual percent change, −0.9 [95% CI, −1.0%, −0.8%]; 2007 to 2015: −2.6 [−2.9%, −2.2%]). Among adults aged between 20 and 39 years, a higher incidence was observed among females during 1995 to 2011, after which a faster decline in female lung cancer incidence (males: −2.5% [−2.8%, −2.2%]; females: −3.1% [−4.7%, −1.5%]) resulted in a lower incidence among females. The white population had a higher incidence than the Black population for small cell carcinoma since 1987. Black females were the only group whose adenocarcinoma incidence plateaued since 2012 (−5.0% [−13.0%, 3.7%]). A higher incidence for squamous cell carcinoma was observed among Black males and females than among white males and females during 1974 to 2015. After circa 2005, octogenarians and older patients constituted the group with the highest lung cancer incidence. Incidence for localized and AJCC/TNM stage I lung cancer among octogenarians and older patients plateaued since 2009, while mortality continued to rise (localized: 1.4% [0.6%, 2.1%]; stage I: 6.7% [4.5%, 9.0%]).
Conclusions: Lung cancer disparities prevail across population segments. Our findings inform effective approaches to eliminate lung cancer disparities by targeting at-risk populations.
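The abstract reports annual percent changes (APC) without stating how they were computed. A common definition fits a log-linear model, ln(rate) = a + b * year, and reports APC = 100 * (exp(b) - 1). The sketch below assumes that definition and a single trend segment; it omits joinpoint detection and confidence intervals. The function name and the rate values are hypothetical, and the rates are synthetic numbers constructed to decline by exactly 2.6% per year, not data from the study.

```python
import math


def annual_percent_change(years, rates):
    """Fit ln(rate) = a + b * year by least squares; return 100 * (exp(b) - 1)."""
    n = len(years)
    if n < 2 or n != len(rates):
        raise ValueError("need at least two (year, rate) pairs of equal length")
    logs = [math.log(r) for r in rates]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    # Ordinary least-squares slope of ln(rate) against year.
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    return 100.0 * (math.exp(slope) - 1.0)


# Synthetic age-standardized rates (per 100,000) that fall 2.6% each year.
years = list(range(2007, 2016))
rates = [60.0 * (1 - 0.026) ** (y - 2007) for y in years]
apc = annual_percent_change(years, rates)
```

Because the synthetic rates are exactly log-linear, `apc` recovers the built-in decline of 2.6% per year up to floating-point error. Real registry data would scatter around the fitted line.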
From market to device: adaptive and efficient malware detection for Android
In the past few years, the market share of the Android system has risen to a leading position. With that large user base, the number of Android applications on Google Play had increased to 3 million by 2018. However, not all of the applications in the market are free from security risks. API misuse and incorrect invocation by developers may cause significant data leakage or tangibly degrade user experience. Meanwhile, due to the complexity of the Android system and the diversity of real usage scenarios, it is quite a challenge to solve all these problems in a straightforward way. Thus, we aim to provide solutions to Android security problems for different usage scenarios separately.
A precise representation of attacks can benefit malware detection in both accuracy and efficiency. However, describing attacks precisely on the Android platform is still far from satisfactory. In addition, new features on Android, such as communication mechanisms, introduce new challenges and difficulties for attack detection. To address these problems from the perspective of service providers and security researchers, we propose abstract attack models to precisely capture the semantics of various Android attacks, which include the corresponding targets, the involved behaviors, and their execution dependencies. Meanwhile, we construct a novel graph-based model called ICCG (Inter-component Communication Graph) to describe the internal control flows and inter-component communications of applications. The models take into account more communication channels while maximally preserving their program logic. With the guidance of the attack models, we propose a static searching approach to detect attacks hidden in the ICCG. To reduce the false positive rate, we introduce an additional dynamic confirmation step to check whether the detected attacks are false alarms. Experiments show that our integrated malware detection system, DroidEcho, can detect attacks in both benchmark and real-world applications effectively and efficiently with a precision of 89.5%.
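To make the idea of searching a graph under the guidance of an attack model more concrete, here is a minimal sketch. It is not DroidEcho's algorithm. It assumes the ICCG can be simplified to an adjacency dictionary that merges control-flow and inter-component edges, that each node is annotated with a set of behavior labels, and that an attack model is an ordered list of behaviors that must occur along one path. All component names and behavior labels are hypothetical.

```python
from collections import deque


def find_attack_path(graph, behaviors, pattern):
    """Return a path whose nodes exhibit the pattern's behaviors in order, else None.

    graph:     dict mapping a node to the nodes reachable from it in one step
    behaviors: dict mapping a node to the set of behavior labels it exhibits
    pattern:   ordered list of behavior labels (a simplified attack model)
    """
    if not pattern:
        return []

    def advance(node, matched):
        # Consume every consecutive pattern step that this node exhibits.
        while matched < len(pattern) and pattern[matched] in behaviors.get(node, ()):
            matched += 1
        return matched

    # Only nodes showing the first behavior can start a matching path.
    starts = [n for n in graph if pattern[0] in behaviors.get(n, ())]
    for start in starts:
        first = advance(start, 0)
        queue = deque([(start, first, [start])])
        seen = {(start, first)}
        while queue:
            node, matched, path = queue.popleft()
            if matched == len(pattern):
                return path
            for nxt in graph.get(node, ()):
                state = (nxt, advance(nxt, matched))
                if state not in seen:
                    seen.add(state)
                    queue.append((nxt, state[1], path + [nxt]))
    return None


# Hypothetical app: one component reads contacts, another sends data out.
graph = {
    "MainActivity": ["ContactsReader", "SettingsActivity"],
    "ContactsReader": ["UploadService"],
    "SettingsActivity": [],
    "UploadService": [],
}
behaviors = {
    "ContactsReader": {"read_contacts"},
    "UploadService": {"send_network"},
}
leak = find_attack_path(graph, behaviors, ["read_contacts", "send_network"])
```

Here `leak` is `["ContactsReader", "UploadService"]`. Reversing the pattern returns `None`, because no edge leads from the sending component back to the reading one. A path found this way is only a candidate, which is why the abstract adds a dynamic confirmation step.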
However, whereas applications provided by the official market (i.e., Google Play Store) can be vetted with a heavy and complicated detection approach (e.g., DroidEcho), apps from unofficial markets and third-party sources continue to pose serious security threats to end users. Meanwhile, downloading an app first and then uploading it to the server side for detection is time-consuming, because the network transmission incurs considerable overhead. In addition, the uploading process is itself exposed to attackers. Consequently, a last line of defense on mobile devices is necessary.
To address this problem, we propose an effective Android malware detection system, MobiTive, leveraging customized deep neural networks to provide a real-time and responsive detection environment on mobile devices. MobiTive is a pre-installed solution rather than an app scanning and monitoring engine used after installation, which makes it more practical and secure. Although a deep learning-based approach can be maintained efficiently on the server side for malware detection, original deep learning models cannot be directly deployed and executed on mobile devices due to various performance limitations, such as computation power, memory size, and energy. Therefore, we evaluate and investigate the following key points: (1) the performance of different feature extraction methods based on source code or binary code; (2) the performance of different feature type selections for deep learning on mobile devices; (3) the detection accuracy of different deep neural networks on mobile devices; (4) the real-time detection performance and accuracy on different mobile devices; and (5) the potential based on the evolution trend of mobile devices' specifications. Finally, we propose a practical solution (MobiTive) to detect Android malware on mobile devices.
Based on the evaluations and findings on MobiTive, we find that syntax features, such as permissions and API calls, lack the semantics that can represent potential malicious behaviors and that would yield a more robust model with high accuracy for malware detection. We further propose an efficient Android malware detection system, named SeqMobile, which adopts behavior-based sequence features and leverages customized deep neural networks on mobile devices instead of the server end. Unlike traditional sequence-based approaches on the server end, SeqMobile adopts three effective performance optimization methods to reduce the time of feature extraction and prediction, so as to meet the performance demands of mobile devices. To evaluate the effectiveness and efficiency of our system, we conduct experiments on the following aspects: 1) the detection accuracy of different recurrent neural networks (RNNs); 2) the feature extraction performance on different mobile devices; and 3) the detection accuracy and prediction time cost of different sequence lengths. The results show that SeqMobile can effectively detect malware with high accuracy. Moreover, our performance optimization methods improve the performance of training and prediction by at least twofold. Additionally, to discover the potential performance optimization offered by the state-of-the-art TensorFlow model optimization toolkit for our sequence-based approach, we also provide an evaluation of the toolkit, which can serve as guidance for other systems leveraging sequence-based learning approaches. Overall, we conclude that our sequence-based approach, together with our performance optimization methods, enables us to efficiently detect malware under the performance demands of mobile devices.
Doctor of Philosophy
Region-by-Region Registration Combining Feature-Based and Optical Flow Methods for Remote Sensing Images
While geometric registration has been studied in the remote sensing community for many decades, few methods successfully register images while accounting for the locally inconsistent deformation caused by topographic relief. To this end, a region-by-region registration combining the feature-based and optical flow methods is proposed. The proposed framework is built on the calculation of pixel-wise displacements and the mosaicking of displacement fields. Concretely, the initial displacement fields for a pair of images are calculated by the block-weighted projective model in flat-terrain regions and by Brox optical flow estimation in complex-terrain regions. The abnormal displacements resulting from the sensitivity of optical flow to land use or land cover changes are adaptively detected and corrected by the weighted Taylor expansion. Subsequently, the displacement fields are mosaicked seamlessly for subsequent steps. Experimental results show that the proposed method outperforms comparative algorithms, achieving the highest registration accuracy both qualitatively and quantitatively.